A semi-supervised approach to question classification

Authors

  • David Tomás
  • Claudio Giuliano

Abstract

This paper presents a machine learning approach to question classification. We define a kernel function based on latent semantic information acquired from unlabeled data. This kernel makes it possible to incorporate external semantic knowledge into the supervised learning process. We combine this knowledge with a bag-of-words representation by means of composite kernels, obtaining state-of-the-art results. Since the semantic information is acquired from unlabeled text, our system can be easily adapted to different languages and domains.
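The abstract does not spell out the kernel definitions. As a rough illustration only, the sketch below assumes a standard latent semantic analysis (LSA) setup: a truncated SVD of a term-by-document matrix built from unlabeled text yields a projection into a k-dimensional latent space; the latent kernel is the cosine similarity of projected bag-of-words vectors; and the composite kernel is the unweighted sum of that kernel and a cosine-normalized bag-of-words kernel. The function names (`build_lsa_projection`, `latent_kernel`, `composite_kernel`), the use of the plain left singular vectors as the projection, and the unweighted sum are hypothetical choices; the paper's exact mapping and weighting may differ.

```python
import numpy as np


def build_lsa_projection(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Top-k left singular vectors (n_terms x k) of a term-by-document matrix.

    Assumption: the unlabeled corpus is given as raw term counts, one column per document.
    """
    u, _s, _vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k]  # numpy returns singular values in descending order


def _unit_rows(m: np.ndarray) -> np.ndarray:
    """Scale each row to unit length, leaving all-zero rows untouched."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)


def bow_kernel(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Cosine-normalized linear kernel on raw bag-of-words rows."""
    return _unit_rows(x) @ _unit_rows(y).T


def latent_kernel(x: np.ndarray, y: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Cosine similarity after mapping bag-of-words rows into the latent space."""
    return _unit_rows(x @ proj) @ _unit_rows(y @ proj).T


def composite_kernel(x: np.ndarray, y: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Hypothetical composite: unweighted sum of the bag-of-words and latent kernels."""
    return bow_kernel(x, y) + latent_kernel(x, y, proj)


# Toy usage: a 4-term vocabulary and an "unlabeled" corpus of 4 documents.
vocab = ["city", "capital", "born", "year"]
term_doc = np.array([[1, 1, 0, 0],   # city
                     [1, 1, 0, 0],   # capital
                     [0, 0, 1, 1],   # born
                     [0, 0, 1, 1]],  # year
                    dtype=float)
proj = build_lsa_projection(term_doc, k=2)
questions = np.eye(4)[:3]  # bag-of-words rows for "city", "capital", "born"
K = composite_kernel(questions, questions, proj)
```

In this toy example "city" and "capital" share no words, so their bag-of-words similarity is 0, but they co-occur in the unlabeled documents, so the latent kernel rates them as identical and their composite value is 1. "city" and "born" stay unrelated under both kernels. Because a sum of valid kernels is itself a valid kernel, a matrix like `K` can be passed to any kernel machine, such as an SVM that accepts a precomputed kernel.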

Similar resources

Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk

This study explores a semi-supervised classification approach, using random forest as the base classifier, to classify the risk of low-back disorders (LBDs) associated with industrial jobs. The semi-supervised approach uses unlabeled data together with a small amount of labelled data to build a better classifier. The results obtained by the proposed approach are compared with those o...

Identifying Cores of Semantic Classes in Unstructured Text with a Semi-supervised Learning Approach

Cores of semantic classes in scenario descriptions can be extremely valuable in question-answering, information extraction, and document retrieval. We propose a semi-supervised learning approach to automatically identify and classify cores of semantic classes in unstructured text. We perform a case study on medical text. The results show that the selected features characterize the cluster struc...

Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

A data stream is a sequence of data generated from various information sources at high speed and in high volume. Classifying data streams faces three challenges: unlimited length, online processing, and concept drift. In related research, the challenge of unlimited stream length is commonly addressed by dividing the stream into fixed-size windows or by gradual forgetting. Concept drift refer...

Combining Unigrams and Bigrams in Semi-Supervised Text Classification

Unlabeled documents vastly outnumber labeled documents in text classification. For this reason, semi-supervised learning is well suited to the task. Representing text as a combination of unigrams and bigrams has not shown consistent improvements over unigrams alone in supervised text classification. Therefore, a natural question is whether this finding extends to semi-supervised learning...

Projected estimators for robust semi-supervised classification (Delft University of Technology)

For semi-supervised techniques to be applied safely in practice, we at least want methods to outperform their supervised counterparts. We study this question for classification using the well-known quadratic surrogate loss function. Unlike other approaches to semi-supervised learning, the procedure proposed in this work does not rely on assumptions that are not intrinsic to the classifier at hand....

Publication date: 2009